Abstract. In this paper we model several simple biochemical operations
on RNA molecules that modify their secondary structure by means of a
suitable variation of Große-Rhode's Algebra Transformation Systems.



1 Introduction 

Biochemical processes are responsible for most of the information processing that
takes place inside the cell. In recent years, several representations and
simulations of specific biochemical processes have been proposed using well-known
rewriting formalisms borrowed from Theoretical Computer Science. Let us
mention, for instance, Fontana's lambda calculus chemistry [3,4], recently revised by
Müller [5] (for a recent survey on artificial chemistry, see [Tl|), the stochastic
Petri net approach, the π-calculus representation of biochemical processes
carried out by networks of proteins jlUj, and the graph replacement approach to
DNA operations [5]. In the latter, an ad hoc graph replacement formalism is
developed to formalize DNA biochemical operations like annealing or denaturing,
by considering DNA double strands to be special graphs.

There is another popular line of research in theoretical biochemistry that 
aims to represent the three-dimensional structure of biopolymers, and especially
of DNA and RNA, by means of different kinds of formal grammars; see, for
instance, [7,13] for two surveys on this topic. The ultimate goal of such a
representation is to understand how the three-dimensional structure of a biopolymer
is determined from its sequence of monomers (for instance, how the sequence 
of ribonucleotides of an RNA molecule determines its secondary structure; see 
below for the relevant details of RNA's biochemistry), and how this structure 
evolves when the biopolymer is modified through biochemical processes. 

Sooner or later, these two lines of research should intersect, and the main
goal of this paper is to bring them a step closer. We formalize some simple
biochemical processes on RNA molecules, such as ribonucleotide removals or
mutations, and their effect on the molecules' three-dimensional structures,
by means of a variant of Große-Rhode's Algebra Transformation Systems (ATS)
on partial algebras representing RNA biomolecules.

Before entering into more details, it is time to introduce a little biochemistry. 
As probably everybody knows, RNA molecules, together with DNA molecules 
and proteins, form the molecular basis of life. An RNA molecule can be viewed 
as a chain of ribonucleotides, and each ribonucleotide is characterized by the 
base attached to it, which can be either adenine (A), cytosine (C), guanine
(G) or uracil (U). An RNA molecule is uniquely determined by the sequence of 
bases along its chain, and it has a definite orientation. Such an oriented chain of 
ribonucleotides is called the primary structure of the RNA molecule. 

In the cell and in vitro, each RNA molecule folds into a specific three- 
dimensional structure that determines its biochemical activity. To determine this 
structure from the primary structure of the molecule is one of the main open 
problems in computational biology, and partial solutions have been proposed 
using Stochastic Context Free Grammars and Dynamic Programming, among 
other tools; see, for instance, [2, Chaps. 9, 10].

This three-dimensional structure is held together by weak interactions called 
hydrogen bonds between pairs of non-consecutive bases.¹ Almost all these bonds
form between complementary bases, i.e., between A and U and between C and 
G, but other pairings do also occur sporadically. For simplicity, in this paper we 
shall only consider pairings between complementary bases. 

In most representations of RNA molecules, the detailed description of their 
three-dimensional structure is overlooked and the attention is focused on its sec- 
ondary structure: the set of its base pairs, or contacts. Secondary structures are 
actually a simplified representation of RNA molecules' three-dimensional struc- 
ture, but that is enough in some applications, as different levels of "graining" 
are suitable for different problems. Two restrictions are usually added to the 
definition of secondary structure: 

- If two bases $b_i$ and $b_j$ are paired, then neither $b_i$ nor $b_j$ can bond
  with any other base; this restriction is called the unique bonds condition.

- If contacts exist between bases $b_i$ and $b_j$ and between bases $b_k$ and $b_l$,
  and if $b_k$ lies between $b_i$ and $b_j$, then $b_l$ also lies between $b_i$ and
  $b_j$; this restriction is called the no-pseudoknots condition.
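
The two restrictions can be checked mechanically. The following is a minimal sketch, assuming that bases are identified by their integer positions along the chain and that every contact is given as a pair (i, j) with i < j; the function name is hypothetical and not part of the formalism of this paper.

```python
def violated_conditions(contacts):
    """Return the names of the conditions violated by a list of (i, j) contacts, i < j."""
    violated = set()
    used = set()
    for i, j in contacts:
        # unique bonds: every position takes part in at most one contact
        if i in used or j in used:
            violated.add("unique bonds")
        used.update((i, j))
    for i, j in contacts:
        for k, l in contacts:
            # no pseudoknots: if k lies strictly between i and j, so must l
            if i < k < j and not (i < l < j):
                violated.add("no pseudoknots")
    return violated
```

For instance, the nested contacts (1, 10) and (2, 9) satisfy both conditions, while (1, 5) and (3, 8) cross each other and therefore form a pseudoknot.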

The unique bonds condition simply captures the fact that the "bond" between 
two consecutive bases is of a different nature, as a part of the molecule's
backbone. The no-pseudoknots condition is usually added in order to enable the use
of dynamic programming methods to predict RNA secondary structures and, 
although real three-dimensional RNA structures have (pseudo)knots, we impose 
it here to show the scope of our approach: if pseudoknots are allowed, one simply
has to allow them in the algebraic representation of RNA secondary structures
and to remove the corresponding production rules from the rewriting system.

¹ Actually, for a hydrogen bond to be stable, the bases involved in it must be several
nucleotides apart, but for simplicity we shall only consider the restriction that they
must be non-consecutive.

Thus, an RNA molecule is traditionally represented as a labelled graph,
with nodes representing the ribonucleotides, their labels denoting the bases
attached to them, and arcs of two different kinds: some representing the order
of the bases in the primary structure (the backbone) and the rest representing
the bonds that form the secondary structure (the contacts) [11,12]. Our
representation is slightly different: the backbone is represented by a partial algebra
corresponding, essentially, to a labelled finite chain, and then the contacts are
specified as arcs of a graph on the nodes of the backbone.

There are some biochemical operations that can be carried out on an RNA 
molecule. For instance, a ribonucleotide can be added or removed somewhere in 
the primary structure, a contact can form between two complementary bases, or 
it can be removed, and a base can mutate into another base. These operations 
may have collateral effects: for instance, if a base mutates into another one and it 
was involved in a contact, then this contact will disappear, as the corresponding 
bases will no longer be complementary, and if a nucleotide is removed and as 
a consequence two nucleotides forming a contact become consecutive, then this 
contact will also break. 

It is precisely when we tried to specify these side effects that we were not 
able to use simple graph transformation systems in an easy way, and we decided 
to use the ATS approach. ATS is a very powerful algebra rewriting formalism,
introduced by M. Große-Rhode in 1999 in order to specify the behavior of
complex-state software systems. It is operationally described, but not categorically
formalized, and it takes care of side effects of the application of rules, similar
to those found in our work. Unfortunately, even the ATS formalism as originally
defined by Große-Rhode, which was designed with software engineering specification
applications in mind, was not suitable, as it stands, for our purposes. Thus we have slightly
modified a simplified version of it, and we have dubbed the resulting formal- 
ism Withdrawal-based Algebra Transformation Systems (WATS). The reason is 
that, in our approach, the inconsistencies are eliminated by retreating, i.e., by 
removing, in a controlled way, the elements and operations that produce them, 
while in the original ATS approach the inconsistencies were eliminated by adding 
operations and identifying points. 

The rest of this paper is organized as follows. In Section 2 we represent RNA 
molecules as suitable partial algebras, in Section 3 we briefly introduce the WATS 
formalism, and then in Section 4 we show how to represent the aforementioned 
biochemical operations on RNA molecules by means of WATS production rules. 
A final section on Conclusions closes the paper. 

2 RNA Molecules as Partial Algebras 

Roughly speaking, we represent the primary structure of an RNA molecule as a
chain $\mathbf{n}$ of length $n \in \mathbb{N}$ with a label in $\{A, C, G, U\}$
attached to each element of the chain, representing the base attached to the
corresponding ribonucleotide, and its secondary structure by means of ordered
pairs in $\mathbf{n} \times \mathbf{n}$.
Let $\Sigma_{ps}$ be the following many-sorted signature:

Sorts : Nat, Bases
Opns  : suc : Nat → Nat
        First, Last : → Nat
        A, C, G, U : → Bases
        minor : Nat, Nat → Nat
        label : Nat → Bases
        κ : Bases → Bases

An RNA primary structure is a finite partial $\Sigma_{ps}$-algebra

$$P = (P_{Nat}, P_{Bases};\ \mathit{First}^P, \mathit{Last}^P, A^P, C^P, G^P, U^P, \mathit{suc}^P, \mathit{minor}^P, \mathit{label}^P, \kappa^P)$$

such that:

i) $(P_{Nat};\ \mathit{First}^P, \mathit{Last}^P, \mathit{suc}^P)$ is a chain with successor
operation $\mathit{suc}^P$, first element $\mathit{First}^P$ and last element $\mathit{Last}^P$.

ii) The operation $\mathit{minor}^P$ models the strict order relation on this chain:
$\mathit{minor}^P(x, y) = x$ if and only if there exists some $n \geq 1$ such that
$y = (\mathit{suc}^P)^n(x)$.

iii) The values of the nullary operations $A^P, C^P, G^P, U^P$ are pairwise different,
$P_{Bases} = \{A^P, C^P, G^P, U^P\}$, and on this set the operation $\kappa^P$ is given by
the involution

$$\kappa^P(A^P) = U^P, \quad \kappa^P(U^P) = A^P, \quad \kappa^P(C^P) = G^P, \quad \kappa^P(G^P) = C^P.$$

iv) The operation $\mathit{label}^P$ is total.

Notice that all these conditions except the last one cannot be specified through
quasi-equations, since they are not satisfied by a trivial (with only one element
of each sort) total $\Sigma_{ps}$-algebra.
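
To make conditions (i) to (iv) concrete, here is a minimal sketch of one possible encoding of an RNA primary structure. It assumes, for illustration only, that the chain is the set of positions 0, ..., n-1 of a non-empty string of bases, and that an undefined value of a partial operation is represented by `None`; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass

# The involution kappa on the carrier of sort Bases.
KAPPA = {"A": "U", "U": "A", "C": "G", "G": "C"}

@dataclass(frozen=True)
class PrimaryStructure:
    """Hypothetical encoding: the chain is 0, 1, ..., n-1 and `bases` gives the labels."""
    bases: str

    def first(self):
        return 0

    def last(self):
        return len(self.bases) - 1

    def suc(self, x):
        # partial: undefined (None) on the last element of the chain
        return x + 1 if 0 <= x < self.last() else None

    def minor(self, x, y):
        # partial: defined, with value x, exactly when y = suc^n(x) for some n >= 1
        return x if 0 <= x < y <= self.last() else None

    def label(self, x):
        # total on the chain
        return self.bases[x]
```

In this encoding `minor` is undefined whenever its first argument does not strictly precede the second, which is how a partial operation can stand for a relation.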

Let $\Sigma_{ss}$ be now the signature containing $\Sigma_{ps}$ and, in addition, the following
sorts and operation symbols:

Sorts : Contacts
Opns  : p_1 : Contacts → Nat
        p_2 : Contacts → Nat

An RNA secondary structure is a partial $\Sigma_{ss}$-algebra

$$B = (B_{Nat}, B_{Bases}, B_{Contacts};\ \mathit{First}^B, \mathit{Last}^B, A^B, C^B, G^B, U^B, \mathit{suc}^B, \mathit{minor}^B, \mathit{label}^B, \kappa^B, p_1^B, p_2^B)$$

whose $\Sigma_{ps}$-reduct is an RNA primary structure and which moreover satisfies the
following quasi-equations:

(1) $p_1^B$ and $p_2^B$ are total

(2) $p_1^B(x) = p_1^B(y) \Rightarrow x = y$

(3) $p_2^B(x) = p_2^B(y) \Rightarrow x = y$

(4) $p_1^B(x) = p_2^B(y) \Rightarrow x = y$

(5) $\mathit{minor}^B(\mathit{suc}^B(p_1^B(x)), p_2^B(x)) = \mathit{suc}^B(p_1^B(x))$

(6) $\mathit{minor}^B(p_1^B(x), p_1^B(y)) = p_1^B(x) \wedge \mathit{minor}^B(p_1^B(y), p_2^B(x)) = p_1^B(y)$
    $\Rightarrow \mathit{minor}^B(p_2^B(y), p_2^B(x)) = p_2^B(y)$

(7) $\kappa^B(\mathit{label}^B(p_1^B(x))) = \mathit{label}^B(p_2^B(x))$

In such an RNA secondary structure, each element $c$ of sort Contacts represents,
of course, a contact between nucleotides $p_1^B(c)$ and $p_2^B(c)$. Equations (2),
(3) and (4) represent the unique bonds condition, equation (5) represents the fact
that there cannot exist a contact between a nucleotide and itself or its successor
in the primary structure, equation (6) represents the no-pseudoknots condition,
and equation (7) represents the fact that a contact can only pair complementary
bases. Notice that if we simply omit equation (6), then pseudoknots are allowed
in the representation of RNA molecules.

Let $\Gamma_{ss} = (\Sigma_{ss}, CE)$ be the specification whose set of consistency equations
$CE$ consists of the quasi-equations (1) to (7) above. Let $\mathrm{Alg}_{\Gamma_{ss}}$ be the
category whose objects are all partial $\Gamma_{ss}$-algebras, i.e., those partial
$\Sigma_{ss}$-algebras satisfying equations (1) to (7), and whose morphisms are the plain
homomorphisms, and let $\mathrm{Alg}_{RNA}$ be the full subcategory of
$\mathrm{Alg}_{\Gamma_{ss}}$ supported on the RNA secondary structures.

3 Withdrawal-based $\Sigma_{ss}$-Algebra Transformation
Systems

Our Withdrawal-based Algebra Transformation Systems (WATS) are a modification
of a simplified version of the Algebra Transformation Systems (ATS)
introduced by Große-Rhode. This modification only affects the last step in the
definition of the application of a rewriting rule through a matching, and therefore
all definitions previous to that one are the same as in the original ATS formalism.
Since we are only interested in rewriting RNA secondary structures, we shall only give
the main definitions for the signature $\Sigma_{ss}$ introduced in the previous section.

So, to simplify the notations, let us denote by $S$, $\Omega$ and $\eta$ the set of sorts, the
set of operation symbols and the arity function of the signature $\Sigma_{ss}$. For every
$\varphi \in \Omega$, set $\eta(\varphi) = (\omega(\varphi), \sigma(\varphi)) \in S^* \times S$.

A $\Sigma_{ss}$-presentation is a pair $P = (P_S, P_E)$ where $P_S = (P_s)_{s \in S}$ is an
$S$-set, whose elements will be called generators, and $P_E$ is a set of equations with
variables in $P_S$

$$t = t', \qquad t, t' \in T_{\Sigma_{ss}}(P_S)_s,\ s \in S.$$

A special type of equations are the function entries, of the form

$$\varphi(a) = b, \qquad \varphi \in \Omega,\ a \in P_{\omega(\varphi)},\ b \in P_{\sigma(\varphi)}.$$

A presentation is functional when all its equations are function entries, and a
functional presentation is consistently functional when for every $\varphi \in \Omega$ and
$a \in P_{\omega(\varphi)}$, there is at most one function entry of the form $\varphi(a) = b$ in $P_E$.
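
The notion of a consistently functional presentation can be illustrated by a short sketch. It assumes a hypothetical encoding in which a function entry $\varphi(a) = b$ is a triple made of the operation name, the tuple of argument generators and the result generator; this encoding is ours and is only meant as an illustration.

```python
from collections import defaultdict

# A function entry phi(a) = b is encoded (hypothetically) as a triple
# (phi, a, b): operation name, tuple of argument generators, result generator.
def is_consistently_functional(entries):
    """True when no pair (phi, a) is given two different results b."""
    results = defaultdict(set)
    for phi, args, value in entries:
        results[(phi, args)].add(value)
    return all(len(values) == 1 for values in results.values())
```

A set of entries containing both label(k1) = x and label(k1) = y, with x and y different generators, is functional but not consistently functional.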

Let $p : P_S \to P'_S$ be a mapping of $S$-sets. If $e$ is an equation $t = t'$ with
$t, t' \in T_{\Sigma_{ss}}(P_S)_s$, then we shall denote by $e[p]$ the equation $t(p) = t'(p)$ where
$t(p), t'(p) \in T_{\Sigma_{ss}}(P'_S)_s$ are the terms obtained from $t$ and $t'$, respectively, by
replacing all variables in them by their corresponding images under $p$. In
particular, if $e$ is the function entry $\varphi(a) = b$, then $e[p]$ stands for the function entry
$\varphi(p(a)) = p_{\sigma(\varphi)}(b)$. Given a mapping of $S$-sets $p : P_S \to P'_S$ and any set $E$ of
equations with variables in $P_S$, let

$$E[p] = \{e[p] \mid e \in E\}.$$

A morphism of $\Sigma_{ss}$-presentations $p : (P_S, P_E) \to (P'_S, P'_E)$ is then a mapping of
$S$-sets $p : P_S \to P'_S$ such that $P_E[p] \subseteq P'_E$.

A $\Sigma_{ss}$-rewriting rule is a pair of $\Sigma_{ss}$-presentations, written $r = (P_l \leftrightarrow P_r)$,

where $P_l = (X_l, E_l)$ and $P_r = (X_r, E_r)$ are functional presentations. Informally,
the left-hand side presentation in such a rule specifies the elements and op- 
erations that must be removed from the algebra which the rule is applied to, 
while its right-hand side presentation specifies the elements and operations to 
be added. The generators that occur in a rule play the role of variables (and 
therefore we shall usually call them variables): those appearing in the left-hand 
side presentation must be matched into the algebra to rewrite, and those ap- 
pearing in the right-hand side presentation must be matched into the resulting 
algebra, in such a way that if a variable occurs in both parts of a rule, its image 
must be preserved. In the sequel, we shall assume that all variables that occur 
in rewriting rules are taken from a universal S-set X that is globally fixed and 
disjoint from all sets of operation symbols in the signatures we use. We shall also 
assume that X is large enough to contain equipotent copies of the carrier sets 
of all algebras we are interested in. 

For every $\Sigma_{ss}$-rewriting rule $r = (P_l \leftrightarrow P_r)$, with $P_l = (X_l, E_l)$ and
$P_r = (X_r, E_r)$, let:

$$X_l^{\ominus} = X_l - X_r, \qquad X_r^{\oplus} = X_r - X_l,$$
$$E_l^{\ominus} = E_l - E_r, \qquad E_r^{\oplus} = E_r - E_l.$$
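
These four sets are plain set differences, as the following sketch shows. It assumes, for illustration, that variables are strings and that a function entry $\varphi(a) = b$ is encoded as a triple (operation name, tuple of arguments, result); the function name and the dictionary keys are hypothetical. The sample rule is the one that removes a nucleotide from the middle of the chain, given in Section 4.

```python
def rule_deltas(X_l, E_l, X_r, E_r):
    """Split a rule into what it removes and what it adds (plain set differences)."""
    return {
        "X_removed": X_l - X_r,  # variables of the left-hand side only
        "X_added": X_r - X_l,    # variables of the right-hand side only
        "E_removed": E_l - E_r,  # function entries of the left-hand side only
        "E_added": E_r - E_l,    # function entries of the right-hand side only
    }

# Rule removing a nucleotide kl lying between ki and kj.
X_l = {"ki", "kl", "kj"}
E_l = {("suc", ("ki",), "kl"), ("suc", ("kl",), "kj")}
X_r = {"ki", "kj"}
E_r = {("suc", ("ki",), "kj")}
deltas = rule_deltas(X_l, E_l, X_r, E_r)
```

Here the only removed variable is kl, no variable is added, both successor entries of the left-hand side are removed and the single entry of the right-hand side is added.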

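For every $\Sigma_{ss}$-algebra $A = (A, (\varphi^A)_{\varphi \in \Omega})$, let $A_S = A$ and

$$A_E = \{\varphi(a) = b \mid \varphi \in \Omega,\ a \in \mathrm{dom}\,\varphi^A,\ \varphi^A(a) = b\}.$$

A match $m$ for a $\Sigma_{ss}$-rewriting rule $r = (P_l \leftrightarrow P_r)$ in $A$ is simply a
presentation morphism $m : P_l \to (A_S, A_E)$. The extension

$$m^* : X_l \cup X_r \to A_S \sqcup X_r^{\oplus}$$

of $m$ is defined by²

$$m^*(x) = \begin{cases} m(x) \in A_S & \text{if } x \in X_l \\ x \in X_r^{\oplus} & \text{if } x \in X_r^{\oplus} = X_r - X_l \end{cases}$$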



The application of $r$ to $A$ through $m$ then rewrites $A$ into the partial
$\Gamma_{ss}$-algebra $B$ defined, step by step, as follows:

1) Set $B_S = (A_S - m(X_l^{\ominus})) \sqcup X_r^{\oplus}$. This step removes from $A$ the elements that
are images of elements in $X_l$ that no longer belong to $X_r$, and adds to it
the elements in $X_r$ that did not belong to $X_l$.

2) Set $B_E = (A_E - E_l^{\ominus}[m]) \cup E_r^{\oplus}[m^*]$. This step removes from $A$ the operations
that are images of function entries in $E_l$ that no longer belong to $E_r$, and
adds to it the equations in $E_r$ that did not belong to $E_l$, with variables in
$B_S$.

3) Since the presentation $(B_S, B_E)$ is functional, it defines a partial $\Sigma_{ss}$-algebra
with carrier set $B = B_S$ by simply translating the function entries in $B_E$ into
operations; if this presentation is not consistently functional, then we must
identify elements in $B$ in order to remove inconsistencies. This step can be
formally described by means of a functor left adjoint to a functor that sends
every $\Sigma_{ss}$-algebra to its presentation $(A_S, A_E)$.

4) If the $\Sigma_{ss}$-algebra defined in this way satisfies equations (1) to (7), we are
done. Otherwise, the violations are handled in two stages:

- Every contact $x \in B_{Contacts}$ that violates equations (1), (5) or (7) is
  removed.

- After performing all removals in the previous stage, if there are still pairs
  of contacts $x, y \in B_{Contacts}$ that violate equations (2), (3), (4) or (6),
  then, if one of them comes from $X_r^{\oplus}$ and the other comes from $A_S$, the
  one from $X_r^{\oplus}$ is removed and the other one preserved, and otherwise both
  are removed.

It is in step (4) where the main difference between Große-Rhode's original
ATS formalism and our WATS formalism lies. In ATS, the $\Sigma_{ss}$-algebra obtained
in (3) would be forced to satisfy equations (1) to (7) by taking its universal
solution in $\mathrm{Alg}_{\Gamma_{ss}}$, and thus adding operations and identifying elements. In our
formalism, violations of equations (1) to (7) are obviated by simply removing in
a controlled way the contacts that yield them.
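
The withdrawal performed in step (4) can be sketched as follows. This is only an illustration under several assumptions of ours: nucleotides are the positions 0, ..., n-1 of a string of bases, contacts are given as a dictionary from (hypothetical) contact names to pairs of positions, an undefined projection is represented by `None`, the argument `new` lists the contacts coming from the right-hand side of the rule, and all pairwise conflicts are evaluated simultaneously rather than one after another.

```python
KAPPA = {"A": "U", "U": "A", "C": "G", "G": "C"}

def violates_alone(bases, contact):
    """Equations (1), (5), (7): checks that involve a single contact."""
    i, j = contact
    if i is None or j is None:                   # (1) both projections are defined
        return True
    if not (0 <= i and i + 1 < j < len(bases)):  # (5) suc(p1) strictly before p2
        return True
    return KAPPA[bases[i]] != bases[j]           # (7) complementary labels

def conflict(c, d):
    """Equations (2), (3), (4), (6): checks that involve a pair of contacts."""
    (i, j), (k, l) = c, d
    shares_base = len({i, j, k, l}) < 4            # (2), (3), (4)
    crossing = (i < k < j < l) or (k < i < l < j)  # (6)
    return shares_base or crossing

def withdraw(bases, contacts, new=frozenset()):
    """Remove the contacts that make equations (1) to (7) fail."""
    kept = {n: c for n, c in contacts.items() if not violates_alone(bases, c)}
    doomed = set()
    names = sorted(kept)
    for pos, a in enumerate(names):
        for b in names[pos + 1:]:
            if conflict(kept[a], kept[b]):
                if (a in new) != (b in new):
                    doomed.add(a if a in new else b)  # only the new contact goes
                else:
                    doomed.update((a, b))             # both contacts go
    return {n: c for n, c in kept.items() if n not in doomed}
```

For example, on the sequence GACUGC an existing contact (0, 2) and a newly added contact (1, 3) cross each other, so only the new one is withdrawn; if both had been present beforehand, both would be removed.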

4 Biochemical operations modelled by means of WATS 

The biochemical operations considered in this paper are the addition, deletion 
and mutation of a ribonucleotide and the addition and deletion of a contact. 
Each of these biochemical operations can be modelled as a rewriting step of a
WATS by the application of a $\Sigma_{ss}$-rewriting rule to an RNA secondary structure.
The rewriting rules that model these biochemical operations are the following
ones.

² As always, we identify any set with its image into its disjoint union with any other
set.

Adding a nucleotide: We have to consider three different cases, corresponding
to adding the new nucleotide at the beginning of the chain, at the end, or
in the middle of it. In this case, each rule must be understood as having a
parameter $x$ which corresponds to the base attached to the new nucleotide.
So, there are four different values of this parameter, the nullary operation
symbols A, U, C and G.

- Rule $P_{\text{add-base-first}}(x)$ has:

• as $P_l$ the set of variables $X_l = \{k_1\}$ of sort Nat and the set of
equations $E_l = \{\mathit{First} = k_1\}$;

• as $P_r$ the set of variables $X_r = \{t, k_1, k_0\}$, of sorts $t \in$ Bases and
$k_1, k_0 \in$ Nat, and the set of equations $E_r = \{\mathit{First} = k_0, \mathit{suc}(k_0) =
k_1, \mathit{label}(k_0) = t, x = t\}$.

- Rule $P_{\text{add-base-last}}(x)$ has:

• as $P_l$ the set of variables $X_l = \{k_n\}$ of sort Nat and the set of
equations $E_l = \{\mathit{Last} = k_n\}$;

• as $P_r$ the set of variables $X_r = \{t, k_{n+1}, k_n\}$, of sorts $t \in$ Bases and
$k_{n+1}, k_n \in$ Nat, and the set of equations $E_r = \{\mathit{Last} = k_{n+1}, \mathit{suc}(k_n)
= k_{n+1}, \mathit{label}(k_{n+1}) = t, x = t\}$.

- Rule $P_{\text{add-base-middle}}(x)$ has:

• as $P_l$ the set of variables $X_l = \{k_i, k_j\}$ of sort Nat and the set of
equations $E_l = \{\mathit{suc}(k_i) = k_j\}$;

• as $P_r$ the set of variables $X_r = \{t, k_i, k_j, k\}$, of sorts $t \in$ Bases and
$k_i, k_j, k \in$ Nat, and the set of equations $E_r = \{\mathit{suc}(k_i) = k, \mathit{suc}(k) =
k_j, \mathit{label}(k) = t, x = t\}$.

Removing a nucleotide: We have to consider again three different cases,
corresponding to removing the nucleotide at the beginning of the chain, at the
end, or in the middle of it.

- Rule $P_{\text{del-base-first}}$ has:

• as $P_l$ the set of variables $X_l = \{k_1, k_0\}$, both of sort Nat, and the
set of equations $E_l = \{\mathit{First} = k_0, \mathit{suc}(k_0) = k_1\}$;

• as $P_r$ the set of variables $X_r = \{k_1\}$ of sort Nat and the set of
equations $E_r = \{\mathit{First} = k_1\}$.

- Rule $P_{\text{del-base-last}}$ has:

• as $P_l$ the set of variables $X_l = \{k_{n-1}, k_n\}$, both of sort Nat, and the
set of equations $E_l = \{\mathit{Last} = k_n, \mathit{suc}(k_{n-1}) = k_n\}$;

• as $P_r$ the set of variables $X_r = \{k_{n-1}\}$ of sort Nat and the set of
equations $E_r = \{\mathit{Last} = k_{n-1}\}$.

- Rule $P_{\text{del-base-middle}}$ has:

• as $P_l$ the set of variables $X_l = \{k_i, k_j, k_l\}$, all of them of sort Nat,
and the set of equations $E_l = \{\mathit{suc}(k_i) = k_l, \mathit{suc}(k_l) = k_j\}$;

• as $P_r$ the set of variables $X_r = \{k_i, k_j\}$ of sort Nat and the set of
equations $E_r = \{\mathit{suc}(k_i) = k_j\}$.



Mutating a base: The mutation of a base is specified by just redefining the
operation label. Thus we consider the following rule:

- Rule $P_{\text{mutation}}$ has:

• as $P_l$ the set of variables $X_l = \{x, y, k\}$, of sorts $x, y \in$ Bases and
$k \in$ Nat, and the set of equations $E_l = \{\mathit{label}(k) = x\}$;

• as $P_r$ the set of variables $X_r = \{x, y, k\}$, of sorts $x, y \in$ Bases and
$k \in$ Nat, and the set of equations $E_r = \{\mathit{label}(k) = y\}$.

Adding a contact: To add a contact we simply add a new element of sort
Contacts and the projections from it to the nucleotides it bonds.

- Rule $P_{\text{add-contact}}$ has:

• as $P_l$ the set of variables $X_l = \{x, y, k_i, k_{i+1}, k_j\}$, of sorts $x, y \in$
Bases and $k_i, k_{i+1}, k_j \in$ Nat, and the set of equations $E_l = \{\mathit{suc}(k_i)
= k_{i+1}, \mathit{minor}(k_{i+1}, k_j) = k_{i+1}, \kappa(x) = y, \kappa(y) = x, \mathit{label}(k_i) = x,
\mathit{label}(k_j) = y\}$;

• as $P_r$ the set of variables $X_r = \{x, y, k_i, k_{i+1}, k_j, c\}$, of sorts $x, y \in$
Bases, $k_i, k_{i+1}, k_j \in$ Nat and $c \in$ Contacts, and the set of equations
$E_r = \{\mathit{suc}(k_i) = k_{i+1}, \mathit{minor}(k_{i+1}, k_j) = k_{i+1}, \kappa(x) = y, \kappa(y) =
x, p_1(c) = k_i, p_2(c) = k_j, \mathit{label}(k_i) = x, \mathit{label}(k_j) = y\}$.

Removing a contact: To remove a contact we simply delete it.

- Rule $P_{\text{del-contact}}$ has:

• as $P_l$ the set of variables $X_l = \{k_i, k_j, c\}$, of sorts $k_i, k_j \in$ Nat and
$c \in$ Contacts, and the set of equations $E_l = \{p_1(c) = k_i, p_2(c) = k_j\}$;

• as $P_r$ the set of variables $X_r = \{k_i, k_j\}$, both of sort Nat, and the
set of equations $E_r = \emptyset$.

It is not difficult to check that an RNA secondary structure is always rewrit- 
ten by the application of any one of these rules through any matching into an 
RNA secondary structure, and that in each case their effect is the desired one. 
This must be done rule by rule and case by case. 
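
As an illustration of how one of these rules acts, steps 1) and 2) of the application of a rule can be sketched on presentations. The sketch assumes a hypothetical encoding in which generators and elements are strings, a function entry $\varphi(a) = b$ is a triple (operation name, tuple of arguments, result), and a match is a dictionary from the variables of the left-hand side to elements of the algebra; it does not perform steps 3) and 4), and it leaves untouched any entry that mentions a removed element.

```python
def substitute(entries, m):
    """e[m]: rename every generator in a set of function entries via the mapping m."""
    return {(phi, tuple(m[a] for a in args), m[b]) for phi, args, b in entries}

def apply_rule(A_S, A_E, X_l, E_l, X_r, E_r, m):
    """Steps 1) and 2) only: build the carrier B_S and the function entries B_E."""
    if not substitute(E_l, m) <= A_E:
        raise ValueError("m is not a match: some entry of E_l[m] is missing from A_E")
    added = X_r - X_l
    m_star = {**m, **{x: x for x in added}}  # the extension is the identity on new variables
    B_S = (A_S - {m[x] for x in X_l - X_r}) | added
    B_E = (A_E - substitute(E_l - E_r, m)) | substitute(E_r - E_l, m_star)
    return B_S, B_E

# A three-nucleotide chain labelled G, A, C, and the mutation rule matched so that
# the first nucleotide mutates from G to U.
A_S = {"n0", "n1", "n2", "A", "C", "G", "U"}
A_E = {("suc", ("n0",), "n1"), ("suc", ("n1",), "n2"),
       ("label", ("n0",), "G"), ("label", ("n1",), "A"), ("label", ("n2",), "C")}
X = {"x", "y", "k"}
B_S, B_E = apply_rule(A_S, A_E,
                      X, {("label", ("k",), "x")},
                      X, {("label", ("k",), "y")},
                      {"k": "n0", "x": "G", "y": "U"})
```

After this step the carrier is unchanged and the entry label(n0) = G has been replaced by label(n0) = U. If a contact had paired n0 with n2, equation (7) would now fail for it, and that contact would be withdrawn in step 4).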

5 Conclusion 

We have modelled several simple biochemical operations on RNA molecules that 
modify their secondary structure by means of rewriting rules in a modified
version of the Algebra Transformation Systems of Große-Rhode, which we have
dubbed Withdrawal-based Algebra Transformation Systems. This modification 
has been made ad hoc for algebras representing RNA secondary structures, but 
we feel that the philosophy of removing inconsistencies by retreating should have 
applications in other contexts, and could probably be formalized for algebras over 
arbitrary specifications. 

In this paper we have made some simplifications on the RNA secondary
structure that could easily be avoided. For instance, if we want to allow
contacts between pairs of bases other than the usual complementary pairs, such as
those between G and U (the so-called wobble pairs, which are not so uncommon),
then we only have to replace the involution $\kappa$ by a symmetric relation on the
carrier of sort Bases. And if we want to impose that two bases paired by a
contact must be at least a fixed distance apart, we only have to modify
equation (5) in a suitable way.

There are also other collateral effects that could, and probably should, be 
specified. For instance, isolated contacts tend to break, and pseudoknots should 
be allowed under certain circumstances. 